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Abstract 

Background: This study aims to suggest an approacli tliat integrates multilevel models and eigenvector spatial 
filtering methods and apply it to a case study of self-rated health status in South Korea. In many previous health-related 
studies, multilevel models and single-level spatial regression are used separately. However, the two methods should be 
used in conjunction because the objectives of both approaches are important in health-related analyses. The multilevel 
model enables the simultaneous analysis of both individual and neighborhood factors influencing health outcomes. 
However, the results of conventional multilevel models are potentially misleading when spatial dependency across 
neighborhoods exists. Spatial dependency in health-related data indicates that health outcomes in nearby neighborhoods 
are more similar to each other than those in distant neighborhoods. Spatial regression models can address this problem 
by modeling spatial dependency. This study explores the possibility of integrating a multilevel model and eigenvector 
spatial filtering, an advanced spatial regression for addressing spatial dependency in datasets. 

Methods: In this spatially filtered multilevel model, eigenvectors function as additional explanatory variables accounting 
for unexplained spatial dependency within the neighborhood-level error. The specification addresses the inability 
of conventional multilevel models to account for spatial dependency, and thereby, generates more robust outputs. 

Results: The findings show that sex, employment status, monthly household income, and perceived levels of stress are 
significantly associated with self-rated health status. Residents living in neighborhoods with low deprivation and a high 
doctor-to-resident ratio tend to report higher health status. The spatially filtered multilevel model provides unbiased 
estimations and improves the explanatory power of the model compared to conventional multilevel models although 
there are no changes in the signs of parameters and the significance levels between the two models in this case study. 

Conclusions: The integrated approach proposed in this paper is a useful tool for understanding the geographical 
distribution of self-rated health status within a multilevel framework. In future research, it would be useful to apply the 
spatially filtered multilevel model to other datasets in order to clarify the differences between the two models. It is 
anticipated that this integrated method will also out-perform conventional models when it is used in other contexts. 
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Background 

To analyze both effects of individual and neighborhood 
factors on individual health outcomes, many previous 
health-related studies utilized multilevel models that can 
analyze two- (or more) level independent variables in 
tandem [1-6]. These studies analyzed various health out- 
comes, such as infant mortality [1], a low birth weight 
[2], preterm birth [3], late-stage breast cancer [4], chil- 
dren's health-related quality of life [5], and tuberculosis 
incidence [6], using aggregated data in common, such 
as county-level, census tract-level, and postal code-level 
data to represent neighborhood-level variables. The stud- 
ies, however, do not take into account underlying spatial 
dependency across neighborhoods; thus their multilevel 
analyses results are potentially misleading in cases where 
data exhibit spatial dependency. Spatial dependency in 
health-related data indicates that health outcomes in 
nearby neighborhoods are more similar to each other 
than to those in distant neighborhoods. In other words, 
these studies only consider within-neighborhood correl- 
ation (i.e., correlation between individuals within the same 
neighborhood) using a hierarchical setting, but fail to ac- 
count for potential between-neighborhood correlation. 

According to Jerrett et al. [7], spatial dependency of 
health outcomes among nearby neighborhoods may arise 
from similar socioeconomic (e.g., health facilities and 
services) and natural environmental conditions (e.g., air 
quality). For example, catchment areas for health facil- 
ities may encompass a broader area, thereby transcend- 
ing localized administrative boundaries. In terms of 
local environment, disease risks from air pollution tend 
to be similar among closer neighborhoods because their 
local wind direction and/or road conditions (and environ- 
mental and traffic policies) are more likely to be similar; as 
a result, residents of those neighborhoods are exposed to 
similar types and concentrations of atmospheric pollutants 
[7-9]. However, the non-spatial multilevel model cannot 
address this spatial dependency because the method 
typically assumes that neighborhoods (i.e., spatial units) 
are statistically independent of each other [10]; thus 
multilevel models have been criticized as non-spatial 
and unrealistic [10-13]. 

Based on the notion of spatial dependency of health out- 
comes, some researchers used both a non-spatial single- 
level linear model ignoring spatial dependency (i.e., linear 
models estimated with ordinary least squares or weighted 
least squares) and a spatial autoregressive model (SAR) 
considering spatial dependency, and compared the two 
methods [9,14]. The authors found that non-spatial 
single-level models and the SAR models provided dif- 
ferent regression results depending on the presence of 
spatial dependency. These two studies, however, made 
limited attempts to model individual characteristics 
when using spatial models, because they used only 



aggregated variables. Studies that analyze health out- 
comes solely via aggregated data using a single-level 
spatial model cannot fully explain factors that truly 
influence individual health outcomes [15]. 

A few researchers have tried to incorporate a geo- 
graphical perspective into the multilevel setting in vari- 
ous ways to take into account both the multilevel 
framework and spatial effects. Some studies attempting 
to address spatial dependency in residuals of multilevel 
models employed spatial lag regression model specifica- 
tions [16,17]. In the spatial lag regression model, the 
spatial autoregressive parameter is denoted as p, which 
indicates the intensity of spatial dependency. Another 
study [18] used multilevel models with geographically 
weighted regression (GWR) developed by Fotheringham 
et al. [13] to consider a spatially varying relationship 
between neighborhood factors and obesity. GWR allows 
researchers to estimate varying regression parameters 
over space. However, in some cases, there can still be 
spatial dependency after GWR is used, although this 
method may mitigate spatial dependency by considering 
spatial variation to some degree; this can influence the 
regression results considerably. In addition, according to 
Wheeler and Tiefelsdorf [19], GWRs R^ goodness of fit 
tends to be high when residuals have high spatial de- 
pendency. Therefore, GWR should be used as an ex- 
ploratory tool for understanding spatial variation rather 
than a statistically stable method for addressing spatial 
dependency. 

As discussed above, limited attention has been paid 
within the literature to integrating multilevel models and 
spatial regression models. However, these two approaches 
should be used in combination because the objectives of 
both methods are important in health-related analyses. 
Thus, it is increasingly necessary to integrate multilevel 
models and spatial regression models, especially the eigen- 
vector spatial filtering method, an advanced approach to 
addressing spatial dependency in datasets. Compared to 
spatial lag regression (or SAR) model specifications, which 
present only one parameter of global spatial component, 
the greatest advantage of eigenvector spatial filtering used 
in this paper is to visualize a spatial structure in a map 
form by decomposing it into smaller-scale spatial patterns 
or local clusters with a set of eigenvectors [20,21]. This 
trait could provide a better understanding of how health 
phenomena are distributed across the space. Additionally, 
because the spatial filtering technique can be applied to a 
generalized linear model specification based on the bino- 
mial or Poisson probability models, it is more flexible than 
the spatial lag regression (or SAR) model, which re- 
quires normalizing factor computation [22]. Compared 
with GWR, which has an inherent problem of multi- 
coUinearity among local regression coefficients [19], the 
spatial filtering method is more statistically reliable because 
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eigenvectors generated in filtering procedure are mutually 
orthogonal, which indicates the absence of multicollinear- 
ity issues. 

Griffiths study [22] showed the possibility of combin- 
ing hierarchical generalized linear models with spatial 
filtering method as a disease mapping technique. Based 
on this idea, the present study presents how multilevel 
modeling components can be linked to the spatial filter- 
ing framework by showing an integrated formulation 
and uses self-rated health status in South Korea to inves- 
tigate whether an integrated "spatially filtered multilevel 
model" generates a more robust regression results than 
a conventional multilevel model. 

This study first identifies whether spatial dependency 
exists within neighborhood-level residuals in the multi- 
level model. Where spatial dependency is detected, the 
eigenvector spatial filtering technique is applied to the 
multilevel model to control for spatial dependency. The 
study then compares the explanatory power of the 
models and the regression results between the conven- 
tional model and the spatially filtered model. 

Methods 

Data and variables 

Data are obtained from the following sources: (1) the 
2009 Community Health Survey (CHS) of South Korea; 
(2) the e-Regional Indicators (2009) provided by Statis- 
tics Korea; and (3) the Korean Deprivation Index (KDI) 
designed by Yoon [23]. The CHS is a survey of health 
outcomes among adults aged 19 or older, conducted by 
the Korea Centers for Disease Control and Prevention. 
A dependent variable, EQ-5D index (EuroQol-5 Dimen- 
sion [24]), is obtained from the CHS. The EQ-5D index 
indicates one of the measures of self-rated health status. 
This index comprises five dimensions (mobility, self-care, 
usual activities, pain/discomfort, and anxiety/depression) 
that are measured by means of a three-point scale (no 
problems; some problems; extreme problems). Respon- 
dents are asked to assess their own health status by select- 
ing the most appropriate indicator for each dimension. 
Thus, based on these responses, a total of 3^ types of self- 
rated health status are produced. Each type has different 
EQ-5D values that enable researchers to compare health 
status between regions or countries [25,26]. A higher value 
indicates that a respondent perceives him/herself health- 
ier. Based on CHS' EQ-5D questionnaire responses, the 
study employs a weighted moder developed by Kang et al. 
[27] to calculate a Korean EQ-5D index. Table 1 provides 
descriptive statistics for the Korean EQ-5D index. In order 
to minimize the impact of variability in age distribution 
across the country, the study included individuals aged 60 
and older. From 61,817 respondents, the average of the 
Korean EQ-5D index is 0.783 and standard deviation is 
0.261 (range -0.229 to 1.0). 



Table 1 Descriptive statistics for a dependent variable 
and independent variables 





N 


% 




Individual-level variables (n = 61 81 7) 








Sex 








Males 


26116 


42.2 




Females 


35701 


57.8 




Employment status 








Employed 


24508 


39.7 




Unemployed 


37293 


60.3 




Perceived levels of stress 








High level 


13140 


21.3 




Low level 


48649 


78.7 






Mean 


Standard 
dev. 


Range 


Monthly household income (US$) 


1382.1 


1988.4 


0.0 - 99553.6 


Neighborhood-level variables 

(n = 223) 








Korean Deprivation Index (KDI) 


0.3 


0.9 


-1.5 - 1.7 


The number of doctors 
per 1000 people 


2.2 


2.0 


0.6 - 20.7 


Degree of the Local Governments' 
Financial Independence (LGFI) 


65.1 


9.5 


33.7 - 91.4 


Dependent variable 








EQ-5D index 


0.783 


0.261 


-0.229 -1.000 



To explore how self-rated health status varies across the 
study area, census tracts are classified into four quartiles 
depending on neighborhood-level EQ-5D values: "Very 
low" (first quartile: 0.675 - 0.756), "Low" (second quartile: 
0.757 - 0.787), "Average" (third quartile: 0.788 - 0.815), 
and "High or very high" (fourth quartile: 0.816 - 0.883). 
The values are visualized as a choropleth map (Figure 1). 

Figure 1 shows how self-rated health status is more 
similar to that in nearby neighborhood census tracts than 
that in distant neighborhoods. This is because nearby 
neighborhoods are likely to have similar demographic and 
socioeconomic characteristics (e.g., sex, age, race, income, 
language, and religion) and political resources within a lar- 
ger citywide system [28,29]. In South Korea, development 
policies have focused more on rapid economic growth 
than the distribution of accumulated wealth, resulting 
in serious regional disparities in health status across the 
country. For example, most districts in Seoul, Koreas 
largest metropolitan area, show high self-rated health 
status (Figure 1). This is because the Seoul metropolitan 
area has sufficiently dense infrastructure provision for a 
healthy environment to ensure good accessibility to 
health services [30]. In contrast, many provincial cities 
in non-metropolitan areas excluded from the benefits of 
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economic development, such as Gangwon, Chungnam, 
and Gyeongbuk show low self-rated health status. 

The CHS also provides individual-level variables such as 
sex, employment status, perceived levels of stress, and 
monthly household income. Among these, sex (0 = males; 
1 = females), employment status (0 = employed; 1 = un- 
employed), and perceived levels of stress (0 = people with 
high perceived levels of stress; 1 = people with low per- 
ceived levels of stress) are binary. Monthly household in- 
come is a continuous variable. Descriptive statistics for the 
independent variables are summarized in Table 1. 

The neighborhood-level variables consist of the KDI 
[23], the doctor- to-resident ratio (number of doctors per 
1,000 population), and the degree of the local govern- 
ments' financial independence (LGFI). The KDI is based 
on eight census indicators reflecting neighborhood so- 
cioeconomic levels, such as the proportions of house- 
holds that are: without car ownership; in a low social 



class; comprised of elderly people, etc. The number of 
doctors per 1,000 population and LGFI were obtained 
from e-Regional Indicators (2009). LGFI refers to the 
local governments level of autonomy to raise and use 
financial funds. This ability facilitates implementation of 
welfare policy, such as providing healthy residential en- 
vironment or enhancing health care services. The ratio 
of physicians to residents reflects accessibility to health 
care services. Descriptive statistics for neighborhood- 
level variables are provided in Table 1. 

Multilevel model 

When analyzing both individual and neighborhood va- 
riables in tandem, a multilevel model is generally more 
appropriate than an ordinary single-level regression mo- 
del because it enables researchers to deal with hierarch- 
ical structure of variables [31]. The multilevel model 
assumes that individuals (i.e., lower hierarchy) belonging 
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to a particular neighborhood (i.e., higher hierarchy) are 
not independent of each other because they are presumed 
to share the similar characteristics of that neighborhood; 
thus the model considers intra-neighborhood correlation. 

Model construction begins with analyzing a null' model, 
which is the simplest model and uses no independent vari- 
able. The null model includes distinct types of variance of 
the dependent variable, such as within-neighborhood and 
between-neighborhood variances [32]. Based on this null 
model, an Intra-class Correlation Coefficient (ICC) is 
calculated, which guides how the null model should be 
extended further. The ICC is the ratio between the 
between-neighborhood variance and the sum of both 
within-neighborhood and between-neighborhood variances. 
A high ICC indicates that between-neighborhood variance 
is not negligible, and thus a multilevel model should be 
employed to explain the inter-neighborhood dynamics. 

The null model is then extended to a more advanced 
multilevel model by adding independent variables at 
the individual- and neighborhood-levels. The two-level 
Equation 1 is expressed as follows [32]: 



Individual-level: Yij = + Py^ij + ^ij 
Neighborhood-level: = Yqq + y^^Zj + uoj.^y = y^^ + Uy 

(1) 

Here, Yij represents the value of the dependent variable 
of the / th individual in neighborhood while X^y and Zj 
indicate the independent variables at different levels. In 
other words, includes data about the individuals in 
neighborhood J; Zj contains data about the neighborhoods. 
1^0 J and are the individual-level intercept and slope, re- 
spectively, in neighborhood / rij indicates the error term 
at the individual-level (i.e., within-neighborhood variance). 
Yoo denotes the average of the dependent variable Yip con- 
trolling for the neighborhood-level variables Zp yoi is the 
slope of the neighborhood-level variables Zp and yio indi- 
cates the overall value of slope at the individual-level, con- 
trolling for the neighborhood-level variables Zp Lastly, Uqj 
and Uy are error terms at the neighborhood-level (i.e., 
between-neighborhood variance). In the framework of 
multilevel modeling, an intercept is assumed to be incon- 
sistent if the neighborhood averages of a dependent vari- 
able differ between neighborhoods. Similarly, when effects 
of independent variables on the dependent variable vary 
across neighborhoods, the slopes of each neighborhood 
are expected to deviate from their average. 

Eigenvector spatial filtering 

Proposed by Griffith [33], an eigenvector spatial filtering 
technique handles spatial dependency within ordinary 
single-level regression by utilizing a linear combination of 



eigenvectors. Eigenvectors function as synthetic explana- 
tory variables expressing underlying spatial structures of 
the regression model [20]. This method allows one to 
visualize local spatial clusters in a map form. Because eigen- 
vectors are always independent of each other, the associated 
spatial structures are thus regarded as being distinct. 

From the perspective of eigenvector spatial filtering, an 
ordinary single-level regression applied to spatial datasets 
consists of two parts: (1) a systematic trend explained by 
independent variables, and (2) unexplained random errors 
that are often spatially autocorrelated [34,35]. That is to 
say, the eigenvector spatial filtering technique can capture 
a spatial signal from unexplained random errors, which in 
turn reinforces the independence of the error term 
[35,36]. This is expressed numerically in Equation 2: 



Y = XyS + £* = + Ex + f 



(2) 



where refers to the systematic trend, while s is the 
n-hy-l spatially autocorrelated error vector. X denotes 
the n-hy-k data matrix (i.e., n number of observations 
and k number of independent variables); ^ indicates the 
/c-by-1 parameter vector corresponding to the independ- 
ent variables. Ey is the spatial signal captured by selected 
eigenvectors E. The dimension of E is n-hy-p (i.e., n num- 
ber of observations and p number of selected eigenvectors), 
and y is the p-hy-l parameter vector corresponding to the 
selected eigenvectors. Lastly, ^ is the n-hy-l spatially- 
independent error vector. 

When generating eigenvectors, two different spatial 
processes are considered: (1) simultaneous autoregres- 
sive (SAR); and (2) spatial lag [35]. These processes may 
generate different analytical results due to their differing 
model structures; for further details, see the study by 
Tiefelsdorf and Griffith [35]. The present study deals 
only with eigenvectors for the SAR process. 

Eigenvectors for the SAR process, {ei, e2, • en}sAR> 
are extracted from a transformed spatial weight matrix 
as follows: 



{ei,e2, "^enlsAR = evec 



M, 



(x)-(V + V^)M(x) 



(3) 



where a projection matrix M(x) = I - X(X^X)'^X^; I re- 
presents an n-hy-n identity matrix; X is an n-hy-k matrix 
including n number of objectives and k number of inde- 
pendent variables. A subset of {ei, e2, • • ChIsar is denoted 
by EsARj which contains only selected eigenvectors. This 
set of eigenvectors can be introduced in a model as spatial 
proxies to 'filter out' spatial dependency [35]. 

Eigenvectors are selected in a stepwise manner, and 
the selection procedure is repeated until the value of 
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Moran s (an indicator of a strength of spatial de- 
pendency) approaches a pre-determined threshold 
(e.g., |z(Morans 7)|<0.1). Each eigenvector, owing to 
their mutual orthogonality, shows its unique spatial pat- 
terns and different degrees of spatial dependency. The first 
selected eigenvector has the highest Moran s / value and 
therefore accounts for the largest proportion of the overall 
spatial dependency. The second eigenvector has the second- 
highest Morans / value, and is uncorrected with the first 
one [20]; similarly, the nth. eigenvector is considered to have 
the f2th-highest Morans / value, expressing the A2th-largest 
proportion of the spatial dependency. 

Spatially filtered multilevel model 

Equation 1 of the conventional multilevel model can be 
rearranged as follows: 

fixed effects random effects 

(4) 

Basically, this multilevel model can be divided into two 
parts, representing fixed effects (that are modeled in a 
multileveled manner), and random effects (that are unex- 
plained and often spatially autocor related). If this model is 
corrected by the eigenvector spatial filtering technique, 
the spatial signal can be introduced as follows: 

fixed effects 

/ ^ s 

^ij= (roo + roi^; + rio^^) 

Xfi {systematic trend) 

spatially autocorrelated random effects 
+ (ro + Yl^V + • • • + Yn^nj) + {u'oj + u'Xij + nj) 

' V ^ V 1 ^ 

^■^{spatial signal) ^ {white— noise) 

(5) 

This integrated model, entitled spatially filtered mul- 
tilevel model/ regards the fixed effects in the multilevel 
model as identical to the systematic trend Xy^ in the 
framework of eigenvector spatial filtering. In this mo- 
del, a linear combination of eigenvectors Ey is included 
as a spatial proxy to separate the spatial signal from 
the spatially autocorrelated random effects at the 
neighborhood-level (wq; + UyXij), leaving only a white- 
noise Uqj + UyXij within them. This filtering process 
results in unbiased regression results that improve the 
explanatory power of the model. 

All analyses are conducted in the R environment. The 
lme4' package [37] is used for the multilevel model run, and 
the spdep' package [38] is employed for the 'SpatialFiltering' 
function for the eigenvector spatial filtering. 



Results 

Results of the conventional multilevel model 

The null model finds that the variance at neighborhood- 
level is 2.3% (ICC = 0.023). This indicates that 2.3% of 
the total variance in self-rated health status arises from 
inter-neighborhood dynamics. Given that a health out- 
come itself is generally influenced more by individual 
factors than by neighborhood characteristics, it is rea- 
sonable that variance at individual- level is much larger 
than that at neighborhood-level. The 2.3% variance at 
neighborhood-level should be regarded with some cau- 
tion, because Kreft and de Leeuw pointed out that for a 
sufficiently large number of samples, even a small ICC 
(for example, 1%) could considerably affect the degree of 
significance [31]. 

To identif)^ the effects of independent variables on indi- 
vidual health status, the individual-level model (hereafter. 
Level- 1 model) is then designed by adding individual-level 
variables to the null model. An intercept for each inde- 
pendent variable in this study is assumed to be random 
across the study area. Except for the slope for the monthly 
household income variable, a slope for each independent 
variable is regarded as fixed for simplicity of modeling. As 
shown in Table 2, the Level- 1 model yields much lower 
Akaike Information Criterion (AIC) compared to the null 
model, indicating a better model fit [39]. All individual- 
level variables (sex, employment status, perceived levels 
of stress, and monthly household income) are signifi- 
cantly associated with individual self-rated health status. 
These variables are found to account for 22% of 
variance at individual-level and 31% of variance at 
neighborhood-level. The reason why the Level- 1 model 
partially explains variance at neighborhood level— des- 
pite it not including neighborhood-level variables— is 
that regression analyses are performed separately for 
each neighborhood. 

For the next step, both individual -level and neighbor 
hood-level variables are added together in the neighbor 
hood-level model (hereafter, Level-2 model). By introdu- 
cing neighborhood-level variables, a further 2% of vari- 
ance at neighborhood-level is explained compared with 
the Level- 1 model. This suggests that neighborhood fac- 
tors explicitly influence the individuals' self-rated health 
status. The Level-2 model shows the lowest AIC and 
the highest explanatory power among the three models. 
Like the Level- 1 model, all individual-level variables 
remain significant (p < 0.001). Of the three neighbor 
hood-level variables, only two variables, KDI and the 
doctor-to-resident ratio, are statistically significant 
(p < 0.05) (Table 2). 

Results of applying eigenvector spatial filtering 

Before applying the eigenvector spatial filtering method, 
we tested for spatial dependency between neighborhood- 
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Table 2 Estimation results for the conventional multilevel model and the spatially filtered multilevel model 



Variables Null Level-1 multilevel Level-2 multilevel Spatially filtered 

model model model multilevel model 



Individual-level variables 










Sex (male:0; female:!) 


- 


- 49.88*** 


- 49.65*** 


- 49.69*** 


Monthly household income 


- 


0.!0*** 


0.!0*** 


0.09*** 


Employment status (employed:0; unemployed:!) 


- 


-!34.!0*** 


-! 34.90*** 


-! 35.30*** 


Perceived levels of stress (high:0; low:!) 


- 


! 54.60*** 


! 55.60*** 


! 55.70*** 


Neighborhood-level variables 










Korean Deprivation Index (KDI) 


- 


- 


-23.82* 


-!5.5!* 


The number of doctors per !000 people 


- 


- 


4.85* 


2.60* 


Degree of the Local Governments' Financial Independence (LGFI) 


- 


- 


0.98 


0.!6 


Random effects 










Variance at individual-level 


66725 


52226 


52225 


560! 3 


Between monthly household income variance 




0.0039 


0.0036 


0.00! ! 


Variance at neighborhood-level 


!59! 


!!02 


!062 


555 


Constant 


785.3!*** 


76!. 40*** 


747.00*** 


770.70*** 


Eigenvector selection 








8 eigenvectors 


Moran's / of neighborhood-level residuals 






0.!0!* 


0.005 


AlC 


86! 942 


830665 


830650 


830549 


Log-likelihood 


- 447328 


-4! 5324 


-4!53!4 


-4! 5254 



•p < =0.001 *p < =0.05. 



level residuals in the multilevel model and found this to 
be significant (Morans 7=0.101; p < 0.05). Hence, it is 
necessary to eliminate this spatial dependency by apply- 
ing the eigenvector spatial filtering. 

Eigenvectors in this study are extracted from a trans- 
formed spatial weight matrix based on topological adja- 
cency, so-called a "Queen" criterion— if two areas share 
a boundary or a vertex, the entity of the spatial weight 
matrix is coded as 1, and otherwise, 0. As an eigen- 
vector selection algorithm, this study uses a Morans I 
minimization scheme [35]. 

Figure 2 shows that by adding eigenvectors to the 
model, the degree of spatial dependency becomes reduced 
to the threshold (|z(Morans I) \ < 0.1). This is because se- 
lected eigenvectors explain spatial dependency as synthetic 
variables. A group of 8 eigenvectors (en, e^, e-y, e^, eyj, ^23, 
and 629) are finally selected. The first selected eigen- 
vector explains the greatest proportion of spatial de- 
pendency (Figure 2). 

Selected eigenvectors are illustrated in Figure 3, all of 
which portray positive spatial dependency patterns. The 
first four eigenvectors exhibit explicit local clusters re- 
lated to positive spatial dependency across the study 
area. Given that the first sequenced eigenvector repre- 
sents more noticeable cluster than those later in the 
series, e^x displays the most prominent local cluster 
pattern, as shown in Figure 3-A. 



Discussion 

The spatially filtered multilevel model presents unbiased 
regression results and yields a lower AIC than the con- 
ventional multilevel model. Both analyses present similar 
regression parameters and the same parameter signs 
(Table 2). In this study, addressing spatial dependency 
has little effect on the fixed effects, whereas it improves 
the independence of the random effects. With eigenvector 
spatial filtering, the Morans 1 of the neighborhood-level 
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residual declines from 0.101 to 0.005 and becomes non- 
significant (p = 0.824). 

According to the regression results, self-rated health 
status is significantly higher for respondents meeting the 
following conditions: male; employed; higher monthly 
household income; lower stress level; living in a neigh- 
borhood with lower KDI and proportionally more physi- 
cians. These findings are similar to those of previous 
studies [23,40-45]. For the doctor- to-resident ratio vari- 
able, however, Matteson et al. reported conversely that 
counties with more family practitioners per capita have 
higher infant mortality [1]. However, they also found 
that more hospital beds per capita predicted lower risk 
of infant death. These results are somewhat contradict- 
ory because it is generally considered that the numbers 
of physicians and hospital beds tend to have a strong 
positive relationship [46,47]. There does not appear to 
be a clear and consistent effect of the doctor-to-resident 
ratio on individual health outcomes; further studies are 
therefore needed. The present study finds no significant 



relationship between health status and LGFI, whereas 
some previous domestic studies reported positive rela- 
tionship between LGFI and health outcomes [48,49]. 

This study has several limitations that should be con- 
sidered in future research. First, even after introducing 
neighborhood-level variables into the model, variance at 
neighborhood-level still remains. This may be because 
some of the key determinants of self-rated health status 
are omitted. In future research, other neighborhood 
socioeconomic and environmental factors should be 
considered to explain the remaining variance. For envir- 
onmental factors such as air pollution, it is possible to 
use the interpolated map data in multilevel modeling by 
integrating it with survey datasets via geographic infor- 
mation science (GIS) [50]. Second, given that the re- 
spondents in this study are elderly (aged 60 and over), 
the employment status variable used in this study can be 
problematic, because people in their 70s or older are 
more likely to retire than people in their 60s. In other 
words, it is possible that the regression result could be 
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confounded by an age' factor. Third, although census 
tract data are the only viable option in this study, it 
could be unclear whether census tracts accurately 
represent the geographical areas where health-related 
activities actually occur [21,51]. If they do not, then the 
estimation of neighborhood effects via these adminis- 
trative units would be unclear. Due to human mobility, 
individual health outcomes may be influenced by more 
complex geographical and temporal contexts beyond 
their residential environment [52]. However, it is ac- 
tually difficult to delineate these complex contexts 
because there is a lack of spatial and temporal informa- 
tion in many cases [51]. Kwan defined this as the un- 
certain geographic context problem (UGCoP) [51]. To 
obtain more reaUstic results, future studies should at- 
tempt to identify the actual contexts influencing indi- 
vidual health and mitigate UGCoP. Lastly, some recent 
studies notice that an approach of removing spatial de- 
pendency should practice caution in some cases where 
neighborhood characteristics change abruptly across a 
study area. Some researchers have begun to examine 
this issue; so it must be left to future research. 

Conclusion 

This study explores the effects of individual- and neigh 
borhood-level factors on self-rated health status of 
people over the age of 60 via an approach that combines 
a multilevel model and an eigenvector spatial filtering 
technique. The findings show that sex, employment sta- 
tus, monthly household income, and perceived levels of 
stress are significantly associated with self-rated health 
status. In addition, residents living in neighborhoods 
with low deprivation and a high doctor-to-resident 
ratio tend to report higher health status. There are no 
changes in the signs of parameters or the significance 
level between the two models used in this case study. 
Nevertheless, the proposed spatially filtered multilevel 
model provides unbiased and robust estimations and 
has greater explanatory power than conventional multi- 
level models. The spatially filtered approach is a useful 
tool for understanding the spatial dynamics of self- 
rated health status within a multilevel framework. In 
future research, it would be useful to apply the spatially 
filtered multilevel model to other datasets in order to 
clarify the differences between the two models. The in- 
herent modeling complexities of the eigenvector spatial 
filtering method mean this approach has only recently 
been put to practical use despite its advantage of visual- 
izing underlying spatial structures. This study hopes 
that applied models using the eigenvector spatial filter- 
ing might be developed in many future studies. Finally, 
it is hoped that the present findings might inform pol- 
icy interventions to mitigate health inequality in South 
Korea. 



Endnote 

^See the study by Kang et al. [27] . 
'^Morans /, developed by Moran [53], is calculated as 
follows: 

where n is the number of spatial units; yi and yj are attri- 
bute values at spatial units / and ;; y is the average of y; 
and Vij is an entity of a spatial weight matrix. If attribute 
values at / and ; are both higher (or both lower) than the 
average, Moran s / is a positive value between 0 and 1. 
When the Moran s / is 1, the attribute values of / and ; 
are assumed to be perfectly correlated. On the other 
hand, if the attribute value at / is higher than the aver- 
age, but the value at ; is lower than the average, the 
Moran s / is negative. If attribute values of spatial units 
are perfectly dispersed, Moran s I is -1. A Moran s / of 
zero indicates that there is no spatial dependency and 
thus observations are randomly distributed. 
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